-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wasm -> Miden IR translation #22
Conversation
@bitwalker Following up on #18 (comment)
Sounds reasonable. I've tried to implement |
Let's use
That's only true for addition/subtraction/multiplication AFAIK, most everything else is going to need to know how to interpret the binary representation in order to have the correct semantics, but perhaps those are precisely the lines along which the distinction is drawn in Wasm (in terms of which ops have _s/_u variants, and which don't). In any case, I think the implication is that unless an op in Wasm distinguishes signed vs unsigned, we must assume that the semantics are signed - that is after all the purpose of the two's complement representation. Since Wasm uses dedicated ops to interpret a two's complement binary encoding as an plain unsigned encoding when desired, I think the equivalent approach in our IR would be to While we could take a similar tack as Wasm, preferring to represent integers in the IR as signed by default, with dedicated unsigned ops; Miden is a bit all over the place with its instruction set: field elements are signed, but have a restricted set of operations, while its u32 instruction set is purely unsigned, with a broader set of operations. We can certainly emulate signed 32-bit integers with what Miden provides, but we're not going to get much out of the box - we'll have to implement special handling for basically all ops on signed values, e.g. not just comparisons, right shift, division; but also simple things like For now at least, let's go with the |
My first thought was to use
Thank you for the detailed explanation. It clears up a lot of things for me. I'll go with the |
Yeah that documentation needs clarification now that what I'm anticipating for it's usage has been dialed in a bit better. The primary thing I was trying to convey with that is that It may turn out that definition is too broad for |
@bitwalker I pushed the Wasm globals implementation. |
Perfect, that's what I was anticipating, just wasn't 100% sure if Wasm also had the equivalent of relative addressing and operations which use it, sounds like it's much more straightforward, which keeps things nice and simple. |
@bitwalker I'm looking at the data segments in the spec, and it seems like we need to be able to load a byte array into the memory at a predefined address before starting the execution. There could be multiple data segments, each with its offset and size. Here is how it looks in the wild. The following Rust example: #[inline(never)]
#[no_mangle]
pub fn sum_arr(arr: &[u32]) -> u32 {
arr.iter().sum()
}
#[no_mangle]
pub extern "C" fn __main() -> u32 {
sum_arr(&[1, 2, 3, 4, 5]) + sum_arr(&[6, 7, 8, 9, 10])
} Compiles into the following Wasm: (module
(type (;0;) (func (param i32 i32) (result i32)))
(type (;1;) (func (result i32)))
(func $sum_arr (;0;) (type 0) (param i32 i32) (result i32)
(local i32)
i32.const 0
local.set 2
block ;; label = @1
local.get 1
i32.eqz
br_if 0 (;@1;)
loop ;; label = @2
local.get 0
i32.load
local.get 2
i32.add
local.set 2
local.get 0
i32.const 4
i32.add
local.set 0
local.get 1
i32.const -1
i32.add
local.tee 1
br_if 0 (;@2;)
end
end
local.get 2
)
(func $__main (;1;) (type 1) (result i32)
i32.const 1048576
i32.const 5
call $sum_arr
i32.const 1048596
i32.const 5
call $sum_arr
i32.add
)
(memory (;0;) 17)
(global $__stack_pointer (;0;) (mut i32) i32.const 1048576)
(global (;1;) i32 i32.const 1048616)
(global (;2;) i32 i32.const 1048624)
(export "memory" (memory 0))
(export "sum_arr" (func $sum_arr))
(export "__main" (func $__main))
(export "__data_end" (global 1))
(export "__heap_base" (global 2))
(data $.rodata (;0;) (i32.const 1048576) "\01\00\00\00\02\00\00\00\03\00\00\00\04\00\00\00\05\00\00\00\06\00\00\00\07\00\00\00\08\00\00\00\09\00\00\00\0a\00\00\00")
) A data segment at the end of the module should be loaded into the memory at the address |
It looks like LLVM is predetermining where it's going to place all of the rodata, the main stack, and the start of the heap. They set the
So this is basically the same type of thing I had intended to do with our linker, but LLVM nicely does some of this work for us already. I think we can plan to use globals for this too, but I may add a new variant to represent a section with a specific starting address and size, which we can then take into account when laying out the heap during codegen. The rodata in your example doesn't take up a full page, but they allocate a full page to it anyway, which is different than the default behavior of our globals, but that's easy to accommodate. |
12cd75b
to
c1119e7
Compare
@bitwalker I implemented Wasm data section parsing up to the |
@bitwalker I added a test with Rust memory allocations linking to |
@bitwalker I have not found any |
633fc04
to
99c20c4
Compare
@bitwalker I finished the house cleaning, and it's ready for review. Check TODO section in the description for what's left. I can continue working on this PR, or we can create issues for them, and I will tackle them in the new PRs. |
@bitwalker I see #23 is merged. I'd like to rebase this branch on the new main branch. But I think the force push might mess with any comments you might have written as part of the review. |
@greenhat Go ahead and rebase, I haven't had a chance to get my review completed yet, but can push without messing it up on my end |
bf550b7
to
517e394
Compare
@greenhat Can you rebase this on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just leaving a few early notes. I'm getting into the code_translator
modules next, and once done there we should be good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All done! In general everything looks great, as expected. I had a few questions/suggestions/nits, but nothing major to speak of.
That said, I'm marking this as requesting changes because I would really like to confirm that how the Miden IR type system is being used is correct. In particular, I32 is being used for most things (in fact, signed types are predominantly used, probably because unsigned types only recently were introduced). The rest of the compiler backend assumes that signed vs unsigned types are used appropriately, as in it doesn't allow one to freely treat signed integers as unsigned and vice versa; you must either use the correct types for the operation, or cast to the needed type, so that the semantics of each type are maintained.
If that presents an issue coming from Wasm in particular, we can perhaps rethink some aspects of this, or at least how we surface things in the IR/builders (e.g. rather than having an add
instruction, we have an add_s
vs add_u
, and let the builders manage the complexities of what to do with the types it is given.
In any case, I mostly just want to hash that out before this gets merged (we can do so on a call even, if that's easier).
All-in-all though, great work on this, I'm excited to get this merged and integrated with the other changes!
} | ||
|
||
/// Emit instructions to produce a zero value in the given type. | ||
fn emit_zero(_ty: &Type, //, mut cur: FuncCursor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a duplicate of the other emit_zero
function in the translation_utils
module, no? In any case, doesn't seem to be used since it is stubbed out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not exactly the same, translation_utils::emit_zero
inserts at the last position but here we want to insert at the start of the block. It is needed to handle the case when the variable is used, but we cannot find its definition in the predecessors. So, we want to initialize the variable (rather than failing) at the start of the block. We need some sort of function cursor to set an arbitrary insertion point in addition to the builder interface. I could not find one. I would use it in translation_utils::emit_zero
and call it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. The DataFlowGraph
provides insert_inst
as a primitive, which allows you to specify an InsertionPoint
, which can be either the start/end of a Block
, or before/after an Inst
, but I haven't exposed that in a very convenient way in the DefaultInstBuilder
. All of the InstBuilder
implementations actually use this under the covers, so its really just a matter of surfacing it better in one of the builders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I created #40
Will do! |
9988fed
to
f2230de
Compare
Cargo project in tests/rust-wasm-tests is intended to host different tests of Rust code using dlmalloc in no_std env. The test code in `lib` has dual purpose - to test Miden IR translation (via bin target) and to be used in future tests of semantic equivalence between Rust and generated Miden IR code.
…lobal_set`, fix build after rebase.
…s `call_indirect`) Switch back to using `Vec::push` in dlmalloc test. Add `panic = "abort"` in dlmalloc test Cargo.toml.
Add `InstBuilder::mem_grow`, make `OpCode::MemGrow` to return 1 result in `OpCode::results`.
…es is not stable enough
Cast mem_grow parameter to U32.
… `FunctionBuilderExt`
9ebda1b
to
f0eb478
Compare
I addressed all the notes. Besides that, there are two issues:
I'd rather make issues and fix them in the next PR. |
Awesome! I think we can get this merged now.
Sure thing, I can take care of that ASAP
I'm assuming the problem is we're trying to call a function that gets mangled. To do that, you have two options:
Fundamentally, the mangling scheme is so unstable because it includes in it the hash of the build, i.e. the same hash you find in the Unfortunately, I don't think we can de-mangle function names when compiling to Miden, since they will almost certainly violate Miden's symbol naming rules. However, we could maybe do this for modules we want to test using the emulator, since in the IR we don't have any real restrictions on naming (aside from those imposed by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Great work on this @greenhat
Oh no, it's much simpler than that. I don't need to call the function. The problem is that
Cool hack!
I agree, demangled names will violate Miden's symbol naming rules. I'm thinking about cutting out the hash from the mangled name. It should be hidden behind the option. EDIT: I created an issue for this - #41 |
New PR in place of accidentally closed #17.
This PR adds
miden-frontend-wasm
crate that translates Wasm to Miden IR.The public API consists of the single
translate_module
function, which returns Miden IRModule
. It's integrated into the compiler driver atcompiler/mod.rs
.The diagnostics handler is passed down the stack and is used to report along with returning errors (via
unsupported_diag
macros).The parsing and validation of Wasm code are done with
wasmparser
crate.Unsupported Wasm features are reported as fatal errors. See
module_translator.rs
andsections_translator.rs
for unsupported tag, and parts of imports section and unsupported types handling. Export and table sections are ignored since rustc almost always emits them.Wasm module parsing is done in
module_translator.rs
with module sections parsing insections_translator.rs
. Wasm types conversion is done inwasm_types.rs
.The function code translation is done in
code_translator.rs
, where every Wasm instruction is translated via match.The Wasm local variables are translated to Miden SSA variables. The SSA form is constructed by
SSABuilder
inssa.rs
and is used incode_translator.rs
(seeOperator::Local*
handling) to translate Wasm locals to SSA variables.FunctionBuilderExt
is a wrapper around Miden IRFunctionBuilder
andSSABuilder
, providing additional API for dealing with mutable variables and SSA construction. Think of it asFunctionBuilder
with SSA building capabilities.FunctionBuilderExt
andSSABuilder
do not have anything Wasm specific and can be merged into Miden IRFunctionBuilder
.The SSA form construction (
ssa.rs
) and control flow ops translation code (func_translation_state.rs
, parts ofcode_translator.rs
) wereextracted from Cranelift's internal Wasm -> CLIF translator
cranelift-wasm
with minimal changes.The Wasm -> Miden IR unit tests are in
module_translator/tests.rs
(module-level, control flow ops) andcode_translator/tests.rs
(per-instruction tests).The test for the unsupported Wasm ops is in
code_translator/tests_unsupported.rs
.The Rust -> Wasm -> Miden IR integration tests are in
tests/test_rust_comp.rs
.The list of unsupported (so far) Wasm v1 ops is in
code_translator/tests_unsupported.rs
. See the list inUNSUPPORTED_WASM_V1_OPS
.I've reviewed all the Wasm v1 spec ops.
TODO:
popcnt
forctz
,clz
translation (used to unblock dlmalloc translation);